1  INTRODUCTION

This project is concerned with the optimisation of objective functions
F(x) in a large dimensional space R^n on highly parallel computers.

It has been established that the truncated Newton method introduced by
Dembo & Steihaug [1] is an efficient method for solving large optimisation
problems on a sequential machine, Dixon & Price [2]. The truncated
Newton method consists of two main steps:

(i)  the calculation of the function value F(x), gradient vector g(x)
     and Hessian matrix B(x) at a sequence of points;
(ii) solving the set of linear equations

     B(x) d = - g(x)

     approximately for the search direction d.

It has been shown, Dixon & Mohseninia [3], that the calculation of the
gradient vector and Hessian matrix B(x) can be undertaken very elegantly in
ADA using automatic differentiation, Rall [4]. In ADA it is only necessary
to program the objective function in the normal way, then to declare the
variables to be of type "triplet" and to use an extended definition of the
arithmetic operators to obtain the gradient and Hessian. All the
necessary software for triplet and sparse triplet arithmetic has now been
written and tested. Sparse triplet arithmetic generates the Hessian in a
standard sparse form convenient for large sparse matrices.
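
The idea of triplet arithmetic can be illustrated outside ADA. The following is a minimal sketch in Python, not the authors' ADA package: the class name Triplet, its methods and the dense list storage are all assumptions made for illustration (the sparse triplet form is not attempted). Each quantity carries its value, gradient and Hessian, and the overloaded operators propagate all three, so the objective function is written in the normal way.

```python
class Triplet:
    """Value, gradient and Hessian of an expression in n variables (dense)."""

    def __init__(self, value, grad, hess):
        self.value = value
        self.grad = grad      # list of n first derivatives
        self.hess = hess      # n x n list of second derivatives

    @classmethod
    def variable(cls, value, index, n):
        """Independent variable number `index` out of n variables."""
        grad = [1.0 if j == index else 0.0 for j in range(n)]
        return cls(float(value), grad, [[0.0] * n for _ in range(n)])

    def _lift(self, other):
        """Treat a plain number as a constant triplet."""
        if isinstance(other, Triplet):
            return other
        n = len(self.grad)
        return Triplet(float(other), [0.0] * n, [[0.0] * n for _ in range(n)])

    def __add__(self, other):
        o = self._lift(other)
        grad = [a + b for a, b in zip(self.grad, o.grad)]
        hess = [[a + b for a, b in zip(ra, rb)]
                for ra, rb in zip(self.hess, o.hess)]
        return Triplet(self.value + o.value, grad, hess)

    __radd__ = __add__

    def __mul__(self, other):
        # Product rule: (uv)_ij = u_ij v + u_i v_j + u_j v_i + u v_ij
        u, v = self, self._lift(other)
        n = len(u.grad)
        grad = [u.grad[i] * v.value + u.value * v.grad[i] for i in range(n)]
        hess = [[u.hess[i][j] * v.value + u.grad[i] * v.grad[j]
                 + u.grad[j] * v.grad[i] + u.value * v.hess[i][j]
                 for j in range(n)] for i in range(n)]
        return Triplet(u.value * v.value, grad, hess)

    __rmul__ = __mul__

    def __neg__(self):
        return self * -1.0

    def __sub__(self, other):
        return self + (-self._lift(other))

    def __rsub__(self, other):
        return (-self) + other


def rosenbrock(x, y):
    """Objective written in the normal way; works for floats or Triplets."""
    t = y - x * x
    s = 1.0 - x
    return 100.0 * t * t + s * s


x = Triplet.variable(-1.2, 0, 2)
y = Triplet.variable(1.0, 1, 2)
f = rosenbrock(x, y)   # f.value, f.grad and f.hess are now all available
```

The Rosenbrock function is used only as a familiar two-variable example; it is not one of the test functions of this report.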

In contrast, automatic differentiation in Fortran is messy, as each
arithmetic operation in the calculation of the function must be replaced by
a subroutine call to perform the triplet arithmetic. This is not a major
problem for the simple test functions used in the tests reported herein,
but would be impractical on realistic industrial problems.

Using automatic differentiation in ADA should remove the difficult
task of ensuring that the gradient calculation is coded correctly, and also
the need to estimate a suitable step with which to approximate the Hessian
by differences, which is the common practice.

It  is  anticipated  that  when  the  triplet  arithmetic  is  declared  to  be 
concurrent  tasks  then  the  ADA  version  should  be  efficient  on  highly 
parallel  computers. 

In the truncated Newton method the set of equations

     B(x) d = - g(x)

is usually approached by applying the conjugate direction algorithm. This
is an iterative method that decreases the quadratic approximation to F,

     Q(x, U) = F(x) + g^T U + \frac{1}{2} U^T B U,

and increases ||U|| at each inner iteration.

The kth inner iteration is terminated when either

(i)  ||g(x+U)|| < ||g(x)|| min (0.1/k, ||g(x)||)

or (ii)  ||U|| > D.

Again the conjugate direction method, which consists of updating
vectors, can readily be posed as concurrent tasks.
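
The outer and inner iterations described above can be sketched as follows. This is a hedged illustration in Python, not the TNCG code used for the tests: the function names are hypothetical, the Hessian is held as a dense list of lists, k is taken to be the outer iteration counter, g(x+U) is taken to be the gradient g + BU of the quadratic model, D is treated as a simple bound on the step length, and no line search or other safeguard is included.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def matvec(M, v):
    return [dot(row, v) for row in M]


def truncated_cg(B, g, k, D):
    """Approximately solve B U = -g by conjugate gradients.

    Stops when ||g + B U|| < ||g|| min(0.1/k, ||g||) or ||U|| > D.
    """
    n = len(g)
    U = [0.0] * n
    r = [-gi for gi in g]            # r = -(g + B U), minus the model gradient
    p = r[:]
    g_norm = norm(g)
    tol = g_norm * min(0.1 / k, g_norm)
    for _ in range(10 * n):          # assumed cap on inner iterations
        Bp = matvec(B, p)
        curvature = dot(p, Bp)
        if curvature <= 0.0:         # assumed safeguard: stop on non-positive curvature
            break
        alpha = dot(r, r) / curvature
        U = [ui + alpha * pi for ui, pi in zip(U, p)]
        r_new = [ri - alpha * bi for ri, bi in zip(r, Bp)]
        if norm(r_new) < tol or norm(U) > D:
            break
        beta = dot(r_new, r_new) / dot(r, r)
        p = [rn + beta * pi for rn, pi in zip(r_new, p)]
        r = r_new
    return U


def truncated_newton(grad, hess, x, D=1.0e3, g_tol=1.0e-8, max_outer=50):
    """Outer iteration: x <- x + U with U from the truncated inner solve."""
    for k in range(1, max_outer + 1):
        g = grad(x)
        if norm(g) < g_tol:
            break
        U = truncated_cg(hess(x), g, k, D)
        x = [xi + ui for xi, ui in zip(x, U)]
    return x


# Small convex quadratic F(x) = 1/2 x^T A x - b^T x used purely as a check.
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x_min = truncated_newton(lambda x: [ax - bi for ax, bi in zip(matvec(A, x), b)],
                         lambda x: A, [0.0, 0.0, 0.0])
```

Every operation in the inner loop is a vector update, an inner product or a matrix vector product, which is what makes the method a natural candidate for concurrent tasks.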


Evidence exists, Dixon & Mohseninia [3], that the introduction of an
incomplete Choleski decomposition of the matrix H(x), and performing the
conjugate gradient algorithm in the scaled space, could be beneficial on
simple problems.

The functions reported in [5] were in a sense special as they were
both modifications of simple low dimensional problems that had been
extended in such a way that the number of distinct eigenvalues remained low
(which favoured the conjugate gradient method) and had a small band width
(which favoured a full Choleski decomposition).

In wishing to test these ideas further, one of the first requirements
was to define a more general set of simple test functions that led to
sparse Hessians that could be made arbitrarily ill-conditioned. The set
chosen is defined in Appendix 1. The Hessians are very sparse, having the
familiar diagonal band structure but with some non-zero diagonals far from
the centre. The functions can be made more ill-conditioned by increasing the
power r of the (i/n)^r coefficient; to date tests have been performed with
r = 0, 1 and 2.

2  NUMERICAL  RESULTS 

The results of the tests using the TNCG code in Fortran are given in
Appendix 2. Most of these were terminated when ||g|| < 10^-6, though for
some ill-conditioned problems it was necessary to continue until ||g|| <
10^-10 to obtain a good approximate result. All the tests were successful,
the algorithm behaving as expected. The results on the large dimensional
ill-conditioned problems were very expensive in terms of computer time and
confirm the need to improve the algorithm in these cases.

The same tests were commenced using ADA. It was found, as expected, that
the same number of outer and inner iterations was usually required, though
on occasions one more or one fewer outer iteration was performed. However the
test series could not be completed because the ratio of the CPU times in
ADA compared with FORTRAN increased rapidly with n. Theoretically the
ratio was expected to remain constant. These results, which are shown in
Appendix 3, seem to indicate that either

(1)  the ADA implementation is poor

(2)  our  ADA  compiler  is  poor 

or  (3)  ADA  is  not  suited  to  high  dimensional  problems  of  this  type. 

This  result  was  unexpected  and  is  being  investigated  further. 
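
As a rough illustration of this growth, the short Python sketch below divides the Ada CPU times by the FORTRAN CPU times for Case 1, using the figures from Tables 1 and 2 of Appendix 2. The variable names are illustrative only, and the two sets of runs were made on different machines (see Appendix 2), so the absolute size of the ratio means little; only its trend with N is of interest.

```python
# Case 1 CPU times in seconds, copied from Tables 1 and 2 of Appendix 2.
N_VALUES = [15, 30, 60, 90, 120]
ADA_CPU = [3.07, 9.47, 31.22, 79.47, 142.36]
FORTRAN_CPU = [0.26, 0.55, 0.92, 1.53, 2.02]

# Ratio of Ada to FORTRAN CPU time for each N.
ratios = [ada / fortran for ada, fortran in zip(ADA_CPU, FORTRAN_CPU)]

for n, ratio in zip(N_VALUES, ratios):
    print(f"N = {n:4d}   Ada/FORTRAN CPU ratio = {ratio:6.1f}")
```

For this case the ratio rises from roughly 12 at N = 15 to roughly 70 at N = 120 rather than staying constant.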

Initial investigations using the incomplete Choleski code on large
problems have indicated that its performance can be poor if the
decomposition attempts to introduce a large number of negative diagonal
elements. It is intended to introduce Papadrakakis's safeguards [6] to
overcome this problem, and to determine whether the good results obtained on
some special problems will be repeated on this very different test set.
However his safeguards are heuristic for general problems and it seemed
sensible to investigate a more theoretical approach.

3  THEORETICAL  STUDY 

Let  us  assume  that  the  Hessian  matrix  is  A  and  that  R  is  an 
approximation  to  the  inverse. 


The iterative scheme

     x^{k+1} = (I - RA) x^k + R b                                   (2)

converges to the solution x*, if ||I - RA|| < 1.

This is easily seen if, using b = A x*, we write it as

     x^{k+1} - x* = (x^k - x*) + R (A x* - A x^k);

if now e is the error, then

     e^{k+1} = (I - RA) e^k,
     ||e^{k+1}|| \le ||I - RA|| ||e^k||.

The performance will improve in terms of iterations as ||I - RA|| is
decreased; but any matrix R for which ||I - RA|| < 0.5 will lead to a
reasonable speed of convergence. So we might wish to find a sparse matrix
R for which N = ||I - RA|| < 0.5 for some norm. We note that if R = 0
then N = 1 and if R = A^{-1} then N = 0. It should therefore be possible by
introducing the variables one at a time to find a sparse matrix R with the
required property. The iteration (2), consisting as it does of sparse
matrix vector multiplications, should be efficient on a highly parallel
computer.
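
A small numerical illustration of iteration (2) is sketched below in Python. The choice R = diag(1/A_ii) and the 3 x 3 matrix are assumptions made only for this example (they are not taken from the test set), and the maximum row-sum norm is used simply because it is easy to evaluate. The iteration is written in the equivalent form x <- x + R(b - Ax).

```python
def norm_I_minus_RA(A, R):
    """Maximum absolute row sum of B = I - RA."""
    n = len(A)
    worst = 0.0
    for i in range(n):
        row_sum = 0.0
        for j in range(n):
            ra_ij = sum(R[i][k] * A[k][j] for k in range(n))
            row_sum += abs((1.0 if i == j else 0.0) - ra_ij)
        worst = max(worst, row_sum)
    return worst


def approx_inverse_iteration(A, R, b, steps):
    """Apply x <- (I - RA) x + R b, starting from x = 0."""
    n = len(b)
    x = [0.0] * n
    for _ in range(steps):
        residual = [b[i] - sum(A[i][j] * x[j] for j in range(n))
                    for i in range(n)]
        x = [x[i] + sum(R[i][j] * residual[j] for j in range(n))
             for i in range(n)]
    return x


# Illustrative data: a diagonally dominant matrix whose solution is (1, 1, 1).
A = [[5.0, 1.0, 0.0], [1.0, 5.0, 1.0], [0.0, 1.0, 5.0]]
b = [6.0, 7.0, 6.0]
R = [[1.0 / A[i][i] if i == j else 0.0 for j in range(3)] for i in range(3)]

contraction = norm_I_minus_RA(A, R)          # below 0.5 for this example
x = approx_inverse_iteration(A, R, b, 60)    # close to (1, 1, 1)
```

Each sweep needs only one product with A and one with R, so a sparse R keeps the cost per iteration low.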

There are of course a number of norms that can be considered.
Introducing B = I - RA, ||B||_2^2 = max eigenvalue of B^T B, noting that
B^T B is symmetric and positive semi-definite. Then

     T = Trace (B^T B) \ge ||B||_2^2

so we could select R_ij to reduce T = Trace (B^T B); T is of course a
quadratic in R_ij. Since

     T = \sum_i T_i,     T_i = \sum_j B_ij^2,

and row i of B depends only on row i of R, we can introduce elements
R_ij in the row R_i to reduce T_i. This is a smooth quadratic function,
so the introduction of each variable must reduce T_i.

Viz.  R_ik = 0 for all k gives T_i = 1;
      introducing R_ii = A_ii / \sum_j A_ij^2 gives
      T_i = 1 - A_ii^2 / \sum_j A_ij^2.

The method must converge with T_i = 0 in n steps at most if at each
iteration the parameters R_ij are chosen to minimise T_i over the spanned
subspace. This method will be investigated further. It is however not
clear from the above approach when the process should be terminated.
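
The first step of this row-by-row construction can be checked numerically. The Python sketch below assumes B = I - RA and T_i = \sum_j B_ij^2, introduces only the diagonal element R_ii = A_ii / \sum_j A_ij^2 in each row, and compares the resulting T_i with the closed form 1 - A_ii^2 / \sum_j A_ij^2. The function names and the small test matrix are illustrative assumptions.

```python
def diagonal_row_fit(A):
    """Introduce only R_ii in each row, chosen to minimise T_i."""
    n = len(A)
    R = [[0.0] * n for _ in range(n)]
    T_closed_form = []
    for i in range(n):
        row_sq = sum(a * a for a in A[i])          # sum_j A_ij^2
        R[i][i] = A[i][i] / row_sq
        T_closed_form.append(1.0 - A[i][i] ** 2 / row_sq)
    return R, T_closed_form


def row_T(A, R, i):
    """Evaluate T_i = sum_j B_ij^2 directly from B = I - RA."""
    n = len(A)
    total = 0.0
    for j in range(n):
        b_ij = (1.0 if i == j else 0.0) - sum(R[i][k] * A[k][j]
                                              for k in range(n))
        total += b_ij * b_ij
    return total


A = [[5.0, 1.0, 0.0], [1.0, 5.0, 1.0], [0.0, 1.0, 5.0]]
R, T_closed_form = diagonal_row_fit(A)
T_direct = [row_T(A, R, i) for i in range(len(A))]
```

With R = 0 every T_i equals 1, so any value of T_i below 1 after this step confirms that introducing the diagonal element has reduced the row contribution.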
However if instead we look at the ||B||_\infty norm

     ||B||_\infty = max_i \sum_j |B_ij| = max_i U_i,

     U_i = \sum_j |B_ij| = \sum_j | \delta_ij - \sum_k R_ik A_kj |.

We note that again the elements of each row of R only affect one
subfunction, so the introduction of elements into the ith row of R may
terminate when U_i < 1/2.


One approach would therefore consist of introducing the variables R_ij one
at a time to minimise T_i, terminating when the row sum
U_i = \sum_j |B_ij| < 1/2. This naturally raises
the possibility of minimising U_i rather than T_i. U_i is a non
differentiable function. Its minimisation can be posed as a linear
programming problem. It is however degenerate and it is not necessarily
possible in the simplex method to introduce the variables one at a
time and reduce U_i, as the degeneracy of

     Min  \sum_j (u_j + v_j)

     s.t. u_j - v_j + \sum_k (x_k^+ - x_k^-) A_kj - \delta_ij = 0,
          u_j \ge 0, v_j \ge 0, x^+ \ge 0 and x^- \ge 0,

where R_ik = x_k^+ - x_k^-,
implies that many zero steps may be taken. These and similar methods
involving incomplete Choleski factors will be investigated further.

4  CONCLUSIONS 

(1)  A set of test functions has been defined.

(2)  Fortran and Ada implementations of the Truncated Newton/Conjugate
gradient algorithm have been written using automatic
differentiation.

(3)  The  Fortran  results  were  as  expected  but  the  ADA  results 
deteriorated  as  n  increased. 

(4)  Incomplete  Choleski  versions  of  both  are  almost  complete. 

(5)  Methods  for  finding  sparse  approximate  inverse  Hessians  have  been 
proposed. 

and f(x*) = 1.0.

Twelve cases were investigated and they are classified according to the
values of a_i, b_i, c_i and d_i as shown in the following table

Case No.    a_i         b_i       c_i       d_i

    1       1.0         0.0       0.125     0.125
    2       1.0         0.0625    0.0625    0.0625
    3       1.0         0.125     0.125     0.125
    4       1.0         0.26      0.26      0.26
    5       i/N         0.0       0.125     0.125 i/N
    6       i/N         0.0625    0.0625    0.0625 i/N
    7       i/N         0.125     0.125     0.125 i/N
    8       i/N         0.26      0.26      0.26 i/N
    9       i^2/N^2     0.0       0.125     0.125 i^2/N^2
   10       i^2/N^2     0.0625    0.0625    0.0625 i^2/N^2
   11       i^2/N^2     0.125     0.125     0.125 i^2/N^2
   12       i^2/N^2     0.26      0.26      0.26 i^2/N^2
APPENDIX 2

This appendix contains the results of the TNCG codes using both Ada
and FORTRAN. Table 1 contains the results using Ada for the 12 test cases
and N = 15, 30, 60, 90 and 120. For each case the result is given in 2 rows.
The first row contains no. of function calls/no. of major iterations/no. of
minor iterations. The second row contains the CPU time in seconds. The
computer used for all the Ada test runs was a VAX 11/785 running VMS V 4.5.

Table 2 contains the results for the same runs as in table 1 but using
FORTRAN. The computer used for the FORTRAN runs was a VAX 8650 running VMS
V 4.6.

From tables 1 and 2 it can be seen that for most of the cases tested
both Ada and FORTRAN codes took the same number of function calls, major
iterations and minor iterations. The few cases where there were
discrepancies between Ada and FORTRAN are marked with an * in both tables.

The stopping rule used in the above runs was ||g|| < 10^-6 where g is
the gradient vector. The CPU time for Ada was large compared with that of
FORTRAN. We then decided to run the FORTRAN code for larger values of N.
The values N = 300, 1500 and 3000 were used. The results for these runs are
given in table 3. In this table the output of each run is given in 4
lines. The first line contains no. of function calls/no. of major
iterations/no. of minor iterations. The second line contains \sum_i |x_i| at the
final point. The third line contains the norm of the gradient vector at the
final point. The last line contains the CPU time in seconds.

The stopping rule used was ||g|| < 10^-6. In cases 9-12 for large N
\sum_i |x_i| is not near zero as expected. For example \sum_i |x_i| = 0.9294 for
case 12 with N = 1500 and \sum_i |x_i| = 1.3238 for case 10 with N = 3000.

To investigate these results we decided to repeat the same test with
an accuracy of 10^-10 instead of 10^-6. Using 10^-10 as the required accuracy
the above problem disappeared but the CPU time increased. The results for
both 10^-6 and 10^-10 accuracy are given in table 3.


N             15          30          60          90          120

Case 1        8/6/10      9/6/10      9/6/10      10/7/10     10/7/10
              3.07        9.47        31.22       79.47       142.36

Case 2        10/7/11     11/7/11     11/7/10     12/8/11*    12/8/11*
              6.55        19.67       68.27       174.88      311.81

Case 3        11/7/12     11/7/12     12/8/14     11/7/11     11/7/11
              6.62        19.32       78.46       156.99      279.76

Case 4        15/9/23     16/10/32    13/8/13     13/8/13     13/8/13
              8.43        27.16       79.72       178.48      318.09

Case 5        12/8/36     11/8/59     12/8/69     11/8/68     11/8/79
              4.08        12.25       41.99       92.68       166.43

Case 6        12/8/37     12/8/46     13/9/66     18/12/136   13/9/75
              7.32        22.85       88.68       259.50      349.08

Case 7        14/9/46     14/9/59     18/11/83    15/10/96    19/12/96
              8.38        25.21       111.01      218.69      468.09

Case 8        18/10/43    17/11/52    18/11/59    35/17/159   21/13/141
              9.67        30.54       109.88      398.44      494.50

Case 9        11/8/50     11/8/87     12/9/200    12/9/270    12/9/353
              4.14        12.49       51.84       109.10      195.16

Case 10       15/11/72    15/10/118   17/12/289   28/15/436   43/21/880*
              9.83        28.82       125.95      359.75      894.51

Case 11       15/10/70    21/12/151   32/17/378*  30/17/491*  24/13/417
              9.31        36.05       194.25      393.29      536.53

Case 12       18/10/59    22/13/144   32/15/298   31/17/487   40/20/569*
              9.64        37.75       176.52      402.21      834.23

Table 1  Results of TNCG using Ada

N             15          30          60          90          120

Case 1        8/6/10      9/6/10      9/6/10      10/7/10     10/7/10
              0.26        0.55        0.92        1.53        2.02

Case 2        10/7/11     11/7/11     11/7/10     11/7/8*     11/7/8*
              0.44        0.92        1.77        2.71        3.61

Case 3        11/7/12     11/7/12     12/8/14     11/7/11     11/7/11
              0.44        0.92        2.00        2.78        3.66

Case 4        15/9/23     16/10/32    13/8/13     13/8/13     13/8/13
              0.64        1.42        2.09        3.04        4.02

Case 5        12/8/36     11/8/59     12/8/69     11/8/68     11/8/79
              0.38        0.90        1.68        2.57        3.51

Case 6        12/8/37     12/8/46     13/9/66     18/12/136   13/9/75
              0.60        1.14        2.85        6.44        5.99

Case 7        14/9/46     14/9/59     18/11/83    15/10/96    19/12/96
              0.72        1.48        3.65        5.06        7.85

Case 8        18/10/43    17/11/52    18/11/59    35/17/159   21/13/141
              0.76        1.58        3.30        9.14        9.57

Case 9        11/8/50     11/8/87     12/9/200    12/9/268    12/9/351
              0.40        0.96        2.88        5.38        8.64

Case 10       15/11/72    15/10/118   17/12/287   28/15/433   40/20/762*
              0.86        1.93        5.98        12.83       27.03

Case 11       15/10/70    21/12/151   26/15/371*  29/16/383*  24/13/415
              0.84        2.34        7.98        12.26       15.52

Case 12       18/10/59    22/13/144   32/15/286   31/17/473   50/24/904*
              0.82        2.28        7.27        14.19       32.05

Table 2  Results of TNCG using FORTRAN

N              300                 1500                3000

Accuracy       10^-6    10^-10     10^-6    10^-10     10^-6    10^-10


10/6/9 

11/7/10 

13/8/11 

Case  1 

4.4  10"^* 
2.7  10~’® 
4.52 

same 

4.56 

5.3  10““ 
6.8  10““ 
25.46 

same 

25.41 

4.2  io:“ 
2.5  10  “ 
57.32 

same 

57.36 

11/7/8 

12/8/11 

14/9/12 

14/9/12 

Case  2 

1.9  10“® 
2.0  10"’ 
8.81 

2.5  10"” 
2.9  10““ 
9.90 

1.5  10““ 
2.1  10““ 
56.57 

same 

58.70 

2.6  10““ 
4.7  10““ 
106.46 

same 

113.37 

12/8/12 

13/8/10 

14/9/13 

14/9/13 

Case  3 

2.5  10~“ 
5.9  10"^^ 
10.19 

same

9.88 

4.0  10“® 
1.8  10“’ 
50.75 

2.0  10““ 
1.6  10““ 
55.97 

5.1  10““ 

1.2  10““ 
109.71 

same 

113.07 

13/8/12 

14/9/15 

15/9/13 

15/9/13 

16/10/16 

Case  4 

5.6  10'* 
1.0  10'* 
10.35 

1.3  10““ 

2.3  10"“ 
11.26 

1.5  10““ 
2.9  10““ 
57.50 

same 

57.14 

1.4  10“’ 
1.9  10“* 
113.29 

1.5  lO:” 

1.9  10  “ 
125.92 

12/8/136 

14/9/231 

13/9/274 

15/10/484 

14/9/240 

15/10/473 

Case  5 

5.1  10'’’ 

1.2  10’* 
11.65 

1.7  10:“ 
9.4  10  “ 
16.11 

3.6  10“* 
1.3  10“* 
91.68 

2.8  10““ 
1.8  10““ 
140.88 

2.0  10““ 
3.9  10“’ 
164.55 

3.9  10“*, 

1.3  10““ 
275.97 

19/12/214 

■1 

37/19/604 

38/20/779 

50/21/814 

60/22/1450 

Case  6 

2.6  io;J 
6.1  10  “ 
26.08 

7.0  10“* 
1.6  10“* 
288.49 

6.7  io:“ 
4.6  10  “ 
333.86 

6.5  10“’ 

1.5  10“’ 
699.62 

6.5  10“’ 

1.5  10“’ 
1084.94 

20/13/177 

21/14/247 

37/16/453 

38/17/592

42/19/764 

43/20/983 

Case  7 

1.2  10"^ 
9.2  10"’ 
24.59 

_ 

6.9  10"“ 
2.7  10““ 
29.63 

6.1  10“* 
7.0  10“’ 
232.45 

5.9  10“ 
8.7  10  “ 
272.12 

1.7  10“! 
2.3  10“’ 
647.87 

1.0  10““ 

1.0  10““ 
800.54 



37/16/233  38/17/327 


33/17/523 


43/18/1051 


38/19/713 


48/20/1353 


Case  8

3.5  10'* 
9.8  10‘* 
33.98 

3.3  10"’® 
1.5  10"“ 
40.86 

7.5  10'* 

1.8  10~’® 
253.96 

7.5  10~* 

1.8  10'’® 
385.80 

1.0  10"® 

9.5  10'” 
608.76 

1.0  10"® 

9.5  10"” 
983.55 

14/9/835 

15/10/1265 

14/10/3340 

15/11/5614 

14/9/4799 

15/10/9409 

Case  9

2.1  10"® 
6.4  10”* 
42.80 

6.9  10“|* 
1.3  10"” 
62.51 

1.6  10'* 

4.6  10~ 
793.80 

8.5  10'” 
3.0  10'” 
1291.51 

5.3  10"* 

2.3  10"’ 
2151.60 

8.9  10'“ 

2.8  10"” 
4276.02 

37/20/1995 

38/21/2395 

58/31/7008 

61/34/15738 

65/32/3216 

70/37/21944 

Case  10

3.9  10"* 
1.6  10"’ 
132.56 

1.0  io:‘ 
5.6  10  “ 
154.44 

0.066  , 

3.2  10'’ 
2153.41 

2.8  10  ” 
4393.73 

1.3228  , 
9.5  10"’ 
2095.39 

1.9  10:; 

2.4  10  ” 
11942.35 

51/24/1666 

53/26/2393 

80/35/3692 

86/40/12870 

69/32/2251 

77/38/29467 

Case  11

6.6  10"^ 
4.2  10"’ 
121.44 

2.0  10"* 
1.4  10"” 
161.36 

0.4611  , 

9.4  10' 
1263.18 

3.9  10  ” 
3638.44 

1.4219 

6.9  10"’ 
1570.84 

5.1  10'® 

2.2  10'” 
15189.91 

53/26/1596 

54/27/2015 

80/37/2435 

97/48/13019 

81/22/3880 

39/64/24144 

Case  12

2.3  10"* 
8.8  10~* 
120.09 

4.8  10"|® 

2.8  10"” 
141.73 

0.9294  , 

1.0  10'* 
908.40 

3.4  10:; 

3.0  10  ” 
3790.82 

1.1997  , 
7.3  10"’ 
2516.05 

2.0  10'* 

1.9  10"” 
13462.14 

APPENDIX 3

The tables of results in Appendix 2 indicate that among the first four
Cases, Case 4 is harder than Case 3, which in turn is harder than Case 2,
while Case 2 is harder than Case 1. The same seems also true for Cases
8, 7, 6 and 5 and for the last set of Cases 12, 11, 10 and 9. These results
were expected when we constructed the test problems. We decided to study
the performance of Cases 1, 4, 5, 8, 9 and 12 in more detail.

In figure 1 CPU time per major iteration is plotted against N when
using the FORTRAN code, for N in the range (15-120).

Figure 2 shows the plot for N in the range (15-3000). Both these
figures show that CPU time/major iteration is linear in N for Cases 1, 4, 5
and 8, while for Cases 9 and 12 the function is non-linear. The same plot
was repeated for the Ada Code and the result is given in figure 3. In this
figure none of the cases shows a linear relationship. It must be stressed
that in figures 1 and 3 the scale of CPU time/major iteration is not the
same since the FORTRAN Code is much faster than the Ada Code. Figure 4
shows CPU time/major iteration plotted against N for both the FORTRAN and
Ada runs. The FORTRAN CPU times were multiplied by 10 in this
figure.

For  the  FORTRAN  Code  it  seems  that  the  relation  is  not  linear  because 
we  have  ignored  the  effect  of  the  changing  number  of  minor  iterations.  For 
cases  1-4  the  ratio  minor  iteration/major  iteration  is  less  than  3,  for 
cases  5-8  this  ratio  becomes  as  large  as  30,  while  for  cases  9-12  this 
ratio  reaches  more  than  500.  In  cases  9-12  the  minor  iterations  play  an 
important  role  which  can  not  be  ignored. 

For this problem, on average, one major iteration costs the same as
about 30 minor iterations. Using

     equivalent iterations = major iterations + minor iterations/30

we plot CPU time/equivalent iteration against N.
Figures 5 and 6 show these plots for the FORTRAN Code. These figures show
that the relation is now virtually linear. The same plot for Ada is given
in figure 7, which indicates that none of the cases is linear. Again the
scale of CPU time/equivalent iteration used in figures 5 and 7 is not the
same since the FORTRAN Code was much faster than the Ada one. To show the
relative time between FORTRAN and Ada, figure 8 contains the plots for both
FORTRAN and Ada. In this figure the FORTRAN CPU time is multiplied by 10
to separate the different cases.
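
The equivalent iteration measure is easily reproduced from the tabulated results. The Python sketch below is illustrative only: it uses the FORTRAN results for Case 9 from Table 2 of Appendix 2, takes the weighting of 30 minor iterations per major iteration quoted above, and the function and variable names are assumptions.

```python
MINOR_PER_MAJOR = 30.0   # one major iteration costs about 30 minor iterations


def equivalent_iterations(major, minor):
    """Equivalent iterations = major iterations + minor iterations / 30."""
    return major + minor / MINOR_PER_MAJOR


# FORTRAN results for Case 9 from Table 2: N -> (major, minor, CPU seconds).
CASE_9_FORTRAN = {
    15: (8, 50, 0.40),
    30: (8, 87, 0.96),
    60: (9, 200, 2.88),
    90: (9, 268, 5.38),
    120: (9, 351, 8.64),
}

cpu_per_major = {}
cpu_per_equivalent = {}
for n, (major, minor, cpu) in CASE_9_FORTRAN.items():
    cpu_per_major[n] = cpu / major
    cpu_per_equivalent[n] = cpu / equivalent_iterations(major, minor)

for n in CASE_9_FORTRAN:
    print(f"N = {n:4d}   CPU/major = {cpu_per_major[n]:.4f}"
          f"   CPU/equivalent = {cpu_per_equivalent[n]:.4f}")
```

Dividing each printed value by N gives a quick check of how close the relation is to linear for the chosen case.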

These  figures  show  that  the  performance  of  the  Ada  Code  needs  more 
investigation  as  it  is  obviously  dominated  by  the  non-linear  factor  that  is 
absent  in  the  FORTRAN  implementation. 

Figure 6  CPU time/equivalent iteration: FORTRAN Code, large N

Figure 8  CPU time/equivalent iteration: FORTRAN and Ada Codes
N.B. For the FORTRAN Code the CPU time was multiplied by 10.


